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A.  Scientific  Research  Goals 

This  report  represents  the  status  of  the  project  referenced  above  as  of  June  30,  1995, 
the  conclusion  of  the  term  of  this  grant. 

The  project  centered  on  a  problem  in  structural  acoustics:  to  determine  the  reflec¬ 
tion  properties  of  an  underwater  object  interacting  with  sonar  signals.  To  generate 
representative  problems,  we  have  been  using  the  SARA  package  produced  by  H.  Al- 
lik  and  D.  Moore  of  Bolt  Beranek  and  Newman,  New  London,  CT.  This  code  uses  a 
finite/infinite  (quadratic)  element  formulation  in  the  frequency  domain.  The  resulting 
linear  system,  which  is  complex  symmetric,  is  currently  solved  with  heavy  dependence  on 
secondary  storage  (i.e.,  disk)  by  Gaussian  elimination  using  a  frontal  approach,  with  no 
pivoting  for  stability.  This  method  has  given  adequate  numerical  results,  even  though 
it  might  be  expected  to  fail  due  to  resonances  in  the  system.  It  is  also  likely  to  be 
too  costly  to  handle  large  problems,  especially  for  three  dimensional  models.  Our  goal 
has  been  to  investigate  two  alternative  approaches  that  we  expect  will  alleviate  these 
difficulties:  the  use  of  iterative  methods  and  the  use  of  parallel  sparse  solvers. 

The  exploratory  results  in  this  report  will  serve  as  the  basis  for  work  in  the  second 
phase  of  the  project. 

B.  Significant  Results 

B.i.  Use  of  Iterative  Methods:  Elman.  O’Leary,  Stewart 

This  research  was  motivated  by  previous  experience  suggesting  that  although  direct 
methods  are  highly  efficient  for  two-dimensional  problems,  a  well-chosen  iterative  method 
can  provide  substantial  computational  improvement  over  direct  solution  on  three-dimensional 
problems.  This  must  be  tempered  by  the  fact  that  little  is  known  about  the  behav¬ 
ior  of  iterative  methods  on  systems  such  as  these,  even  for  one-dimensional  and  two- 
dimensional  problems. 

Our  investigations  are  focused  on  the  potential  of  various  algorithms  as  alternatives 
to  direct  solution  of  the  linear  system.  We  experimented  with  two  iterative  methods: 

1.  The  QMR  algorithm  of  Freund  and  Nachtigal. 


2.  The  LSQR  algorithm  of  Paige  and  Saunders. 

The  original  formulation  of  the  problem  leads  to  a  graded  matrix,  in  which  the  first 
rows  are  much  larger  in  magnitude  than  the  last.  Therefore,  we  allowed  two  different 
scalings: 

1.  Working  with  the  original  matrix  A. 

2.  Working  with  the  matrix  A  =  D*AD,  where  D  is  a  diagonal  matrix  chosen  to 
make  the  diagonal  elements  of  A  equal  to  1. 

Several  preconditioners  M  were  considered. 

1.  No  preconditioning:  M  =  I. 

2.  Preconditioning  by  the  diagonal  blocks  corresponding  to  the  structure,  the  water, 
and  the  boundary  between  the  two. 

3.  Preconditioning  by  a  banded  matrix  matching  some  of  the  elements  of  the  diagonal 
blocks  of  the  original  matrix 

4.  The  Huckle-Grote  preconditioner,  which  constructs  a  sparse  approximate  inverse 
of  A 

Preconditioning  can  be  performed  on  either  the  left  (corresponding  to  a  rescaling  of  the 
equations)  or  on  the  right  (corresponding  to  a  change  of  variables).  Right  precondition¬ 
ing  has  been  moderately  more  successful  in  our  tests. 

The  bulk  of  our  experiments  were  performed  with  two  BBN  test  problems:  one  with 
1882  degrees  of  freedom  corresponding  to  a  100  Hz  plane  wave  incident  to  a  flexible 
sphere  of  radius  1  meter  and  thickness  1  cm;  and  the  other  with  3649  degrees  of  freedom 
derived  from  a  point  load  on  a  submerged  flexible  cylinder  with  spherical  endcaps,  where 
the  cylinder  radius  had  1  meter,  length  of  cylinder  3  meters,  and  wall  thickness  1  cm. 

We  have  had  great  success  in  using  the  diagonal  block  preconditioner.  The  residual 
norm  is  reduced  by  a  factor  of  10-4  in  fewer  than  10  QMR  iterations,  while  the  rela¬ 
tive  error  in  the  solution  drops  by  a  comparable  factor.  This  algorithm  is  the  iterative 
equivalent  of  domain  decomposition ,  in  which  linear  systems  approximating  partial  dif¬ 
ferential  equations  are  solved  in  each  of  two  independent  subdomains,  one  corresponding 
to  the  structure  and  one  corresponding  to  the  water,  and  the  results  are  then  “glued” 
together  by  the  iterative  method.  This  success  in  decoupling  the  structure  from  the 
water  opens  the  possibility  we  can  use  different,  specially  tailored  iterative  methods  on 
each  part. 

We  have  also  studied  the  algebraic  structure  of  the  matrix  to  investigate  whether  a 
preconditioner  that  involved  only  the  real  part  of  the  matrix  (ignoring  the  imaginary 


components)  or  a  banded  portion  of  the  matrix  would  be  successful.  The  preconditioner 
corresponding  to  the  real  part  of  the  matrix  is  not  adequate,  but  a  banded  preconditioner 
can  be  successful  if  done  carefully.  For  example,  talcing  90%  of  each  diagonal  block  does 
not  lead  to  a  practical  algorithm,  but  convergence  can  be  obtained  by  using  100%  of 
both  the  block  for  the  structure  and  the  coupling  block  between  elements  in  the  water 
and  those  in  the  structure,  together  with  70%  of  the  block  for  the  water  elements. 

Using  incomplete  factorization  as  a  preconditioner  did  not  prove  to  be  economical. 
We  also  showed  that  there  is  no  scaling  of  the  matrix  that  makes  all  nonzero  elements 
of  comparable  size. 

Domain  decomposition  on  the  water  blocks  produced  slow  convergence  for  nonover¬ 
lapping  blocks  and  no  convergence  with  overlapping  blocks.  An  SP2  implementation 
was  completed. 

In  other  work  on  parallel  computers,  we  have  performed  some  tests  of  iterative  meth¬ 
ods  on  the  Connection  Machine  5,  for  the  Helmholtz  equation  with  radiation  boundary 
conditions.  This  problem  is  similar  in  form  to  the  block  subproblems  on  the  individual 
subdomains.  Results  for  two-dimensional  problems  indicate  that  iteration  counts  for 
problems  on  an  n  x  n  grid  are  proportional  to  n,  somewhat  better  than  expected  for  an 
indefinite  problem. 

We  also  developed  methods  to  exploit  the  parametric  nature  of  the  problem.  Exper¬ 
iments  with  the  2-dimensional  Helmholtz  equation  showed  that  a  preconditioner  based 
on  a  fixed  frequency  could  be  effectively  used  for  a  whole  range  of  nearby  frequencies. 

B.2.  Parallel  Sparse  Factorization:  Saltz 

This  work  concerned  the  parallel  solution  of  the  finite  element  equations  by  a  parallel 
sparse  direct  solver.  The  single  processor  equivalent  is  the  SARA  program,  which  is 
well  implemented  and  probably  not  subject  to  much  improvement.  Consequently,  any 
speedup  in  the  direct  solution  of  the  system  must  come  from  the  ability  to  use  more 
than  one  processor. 

We  have  developed  (in  collaboration  with  the  IBM  Thomas  J.  Watson  Research 
Center)  a  sparse  Cholesky  code  that  exploits  the  dense  structures  within  the  sparse 
matrix  to  obtain  good  single  processor  performance.  We  have  extended  it  to  a  variety 
of  distributed  memory  architectures.  Our  approach  partitions  the  sparse  matrix  into 
regular  chunks  of  nonzeros,  which  we  call  a  block.  Since  the  blocks  have  very  regular 
internal  nonzero  structure,  a  significant  part  of  the  numerical  computations  can  be 
carried  out  using  efficient  dense  matrix  kernels.  We  can  therefore  make  use  of  Level-3 
BLAS  subroutines,  which  are  highly  optimized  for  modern  architectures,  to  carry  out 
the  numerical  computations. 

For  acoustics  problems,  we  have  produced  a  complex  version  of  this  solver  and  tested 
the  code  with  the  two  problems  generated  by  SARA  discussed  in  Section  B.l  as  well  as 


a  variant  of  the  second  problem,  on  both  an  Intel  Paragon  and  a  Cray  T3D.  A  summary 
of  the  megaflop  rates  achieved  in  these  tests  (using  double  precision  arithmetic)  is  given 
below. 
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